Abstract

Given a boolean $n$ by $n$ matrix $A$ we consider arithmetic circuits for computing the
transformation $x \mapsto Ax$ over different semirings. Namely, we study three circuit models:
monotone OR-circuits, monotone SUM-circuits (addition of non-negative integers), and
non-monotone XOR-circuits (addition modulo 2). Our focus is on separating these
models in terms of their circuit complexities. We give three results towards this goal:

(1) We prove a direct sum type theorem on the monotone complexity of tensor product
matrices. As a corollary, we obtain matrices that admit OR-circuits of size $O(n)$,
but require SUM-circuits of size $\Omega(n^{3/2}/\log^2 n)$.

(2) We construct so-called $k$-uniform matrices that admit XOR-circuits of size $O(n)$,
but require OR-circuits of size $\Omega(n^2/\log^2 n)$.

(3) We consider the task of rewriting a given OR-circuit as a XOR-circuit and prove
that any subquadratic-time algorithm for this task violates the strong exponential
time hypothesis.

Keywords: arithmetic circuits, boolean arithmetic, idempotent arithmetic, monotone separations,
rewriting



This work is an extended version of two preliminary conference abstracts [8, 14].



1. Introduction 

A basic question in arithmetic complexity is to determine the minimum size of an 
arithmetic circuit that evaluates a linear map $x \mapsto Ax$. In this work we approach this
question from the perspective of relative complexity by varying the circuit model while 
keeping the matrix A fixed, with the goal of separating different circuit models. That 
is, our goal is to show the existence of A that admit small circuits in one model but 
have only large circuits in a different model. 

We will focus on boolean arithmetic and the following three circuit models. Our 
circuits consist of either 

1. only $\vee$-gates (i.e., boolean sums; rectifier circuits),

2. only $+$-gates (i.e., integer addition; cancellation-free circuits), or

3. only $\oplus$-gates (i.e., integer addition mod 2).

These three types of circuits have been studied extensively in their own right (see 
Section 2), but fairly little is known about their relative powers. 

Each model admits a natural description both from an algebraic and a combinatorial 
perspective. 

Algebraic perspective. In the three models under consideration, each circuit with inputs
$x_1, \ldots, x_n$ and outputs $y_1, \ldots, y_m$ computes a vector of linear forms

\[ y_i = \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, \ldots, m. \]

That is, $y = Ax$, where $A = (a_{ij})$ is an $m$ by $n$ boolean matrix with $a_{ij} \in \{0, 1\}$ and
the arithmetic is either

1. in the boolean semiring $(\{0, 1\}, \vee, \wedge)$,

2. in the semiring of non-negative integers $(\mathbb{N}, +, \cdot)$, or

3. in $\mathrm{GF}(2)$.

As an example, Fig. 1 displays two circuits for computing y = Ax for the same A using 
two different operators; the circuit on the right requires one more gate. 

Combinatorial perspective. A circuit computing $y = Ax$ for a boolean matrix $A$ can
also be viewed combinatorially: every gate $g$ is associated with a subset of the formal
variables $\{x_1, \ldots, x_n\}$; this set is called the support of $g$ and is denoted $\mathrm{supp}(g)$. The
input gates correspond to the singletons $\{x_j\}$, $j = 1, \ldots, n$, and every non-input gate
computes either

1. the set union ($\vee$),

2. the disjoint set union ($+$), or

3. the symmetric difference ($\oplus$) of its children.
Figure 1: A $\vee$-circuit (left) and a $+$-circuit (right).



This way an output gate $y_i$ will have $\mathrm{supp}(y_i) = \{x_j : a_{ij} = 1\}$.

Note the special structure of a $+$-circuit: there is at most one directed path from
any input $x_j$ to any output $y_i$. In fact, from this perspective, every $+$-circuit for $A$
is easy to interpret both as a $\vee$-circuit for $A$, and as a $\oplus$-circuit for $A$ (equivalently,
there are onto homomorphisms from $(\mathbb{N}, +, \cdot)$ to $(\{0, 1\}, \vee, \wedge)$ and $\mathrm{GF}(2)$). In this sense,
both $\vee$- and $\oplus$-circuits are at least as efficient as $+$-circuits.
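As a small illustration of this remark, the following Python sketch evaluates one and the same wiring over the three arithmetics. It is not taken from the paper: the dictionary encoding, the function names, and the two toy circuits are assumptions made only for this example. When the wiring has at most one path from each input to each output, the three readings give the same matrix; when two paths merge at an output, they differ.

```python
from functools import reduce
from operator import add, or_, xor

# Hypothetical encoding: each non-input gate maps to the list of its children,
# listed in topological order; inputs are named x1, x2, x3.
PLUS_CIRCUIT = {"g": ["x1", "x2"], "y1": ["g"], "y2": ["g", "x3"]}
OVERLAPPING = {"g1": ["x1", "x2"], "g2": ["x2", "x3"], "y1": ["g1", "g2"]}
INPUTS = ["x1", "x2", "x3"]

def evaluate(circuit, outputs, op, x):
    """Evaluate all gates with the binary operation `op` and return the outputs."""
    val = dict(zip(INPUTS, x))
    for gate, children in circuit.items():  # dicts preserve insertion order
        val[gate] = reduce(op, (val[c] for c in children))
    return [val[o] for o in outputs]

def matrix(circuit, outputs, op):
    """Recover the computed matrix column by column from unit vectors."""
    n = len(INPUTS)
    cols = [evaluate(circuit, outputs, op, [int(i == j) for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(outputs))]

# One path per input/output pair: OR, SUM and XOR agree.
same = [matrix(PLUS_CIRCUIT, ["y1", "y2"], op) for op in (or_, add, xor)]

# Two paths from x2 to y1: the three arithmetics disagree in that entry.
differ = [matrix(OVERLAPPING, ["y1"], op) for op in (or_, add, xor)]
```

In the second circuit the entry for $x_2$ is 1 under $\vee$, 2 under $+$, and 0 under $\oplus$, which is why only the first wiring is a legal $+$-circuit for a boolean matrix.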

Relative complexity. More generally we fix a boolean matrix A and ask how the circuit 
complexity of computing y = Ax depends on the underlying arithmetic. 

To make this quantitative, denote by $C_\vee(A)$, $C_+(A)$, and $C_\oplus(A)$ the minimum
number of wires in an unbounded fan-in circuit for computing $y = Ax$ in the respective
models. For simplicity, we restrict our attention to the case of square matrices so that
$m = n$.

For $X, Y \in \{\vee, +, \oplus\}$, we are interested in the complexity ratios

\[ \mathrm{Gap}_{X/Y}(n) := \max_{A \in \{0,1\}^{n \times n}} C_X(A)/C_Y(A). \]

For example, we have that $\mathrm{Gap}_{\vee/+}(n) = \mathrm{Gap}_{\oplus/+}(n) = 1$ and that
$\mathrm{Gap}_{+/\oplus}(n) \ge \mathrm{Gap}_{\vee/\oplus}(n)$ for all $n$, by the above fact that each $+$-circuit can be
interpreted as a $\vee$-circuit and as a $\oplus$-circuit.

We review the motivation for studying separation bounds in Section 2. Next, we 
state our results, which are summarised in Figure 2. 

1.1. Our results 

We begin by studying the monotone complexity of tensor product matrices of the 
form 

\[ A = B_1 \otimes B_2, \]

where $\otimes$ denotes the usual Kronecker product of matrices. In Section 3, we prove a direct
sum type theorem on their monotone complexity. As a corollary, we obtain matrices that
are easy for $\vee$-circuits, $C_\vee(A) = O(n)$, but hard for $+$-circuits, $C_+(A) = \Omega(n^{3/2}/\log^2 n)$.
This implies our first separation: 
Figure 2: Separation bounds. An arrow from $Y$ to $X$ is labelled with $\mathrm{Gap}_{X/Y}(n)$; bounds for
$(X, Y)$-Rewrite are given inside square brackets.
We are not aware of any prior lower bound techniques that work against $+$-circuits,
but not against $\vee$-circuits. Hence, as far as we know, Theorem 1 is a first step in this
direction. 

Next, we separate $\vee$- and $+$-circuits from $\oplus$-circuits by considering matrices that
look locally random in the following sense:

Definition ($k$-uniformity). A random matrix $A$ is called $k$-uniform if the entries in
every $k \times k$ submatrix have a marginal distribution that is uniform on $\{0, 1\}^{k \times k}$.

Equivalently, a matrix is $k$-uniform if each of its entries is 0 or 1 with equal probability
and the entries in every $k \times k$ submatrix are mutually independent.

In Section 4 we construct $n^{\Omega(1)}$-uniform matrices that are easy for $\oplus$-circuits:

Theorem 2. There are $n^{\Omega(1)}$-uniform matrices $A$ having $C_\oplus(A) = O(n)$.

These $k$-uniform matrices turn out to be difficult to compute using monotone circuits.
Indeed, as a corollary, we will obtain our second separation:

Corollary 3. $\mathrm{Gap}_{\vee/\oplus}(n), \mathrm{Gap}_{+/\oplus}(n) = \Omega(n/\log^2 n)$.

Separations between $\vee$- and $\oplus$-circuits have also been considered by Sergeev et
al. [11, 12] who proved the slightly weaker bound $\mathrm{Gap}_{\vee/\oplus}(n) = \Omega(n/(\log^6 n \log\log n))$.
Furthermore, Jukna [17] has informed us that the bound in Corollary 3 can actually
be proved more directly using existing methods [15, 28]. Nevertheless, we hope our
alternative approach via $k$-uniform matrices might be of independent interest, for
example, in closing the gap between the current lower bound $\mathrm{Gap}_{\vee/\oplus}(n) = \Omega(n/\log^2 n)$
and the best known upper bound $\mathrm{Gap}_{\vee/\oplus}(n) = O(n/\log n)$; see Section 2.

As is true in the case of $\mathrm{Gap}_{\vee/\oplus}$, we conjecture more generally that all the non-trivial
complexity gaps between the three models are of order $n^{1-o(1)}$. While we are unable to
enlarge the gap in Theorem 1, or prove any super-constant lower bounds on $\mathrm{Gap}_{\oplus/\vee}$, our
final result provides some evidence towards these conjectures.



In Section 5, we show that if certain $\vee$-circuits that are derived from CNF formulas
could be efficiently rewritten as equivalent $+$- or $\oplus$-circuits, this would imply unexpected
consequences for exponential-time algorithms. More precisely, we study the following
problem.

The $(X, Y)$-Rewrite problem: On input an $X$-circuit $C$, output a $Y$-circuit that computes
the same matrix as $C$.

Both $(\vee, +)$-Rewrite and $(\vee, \oplus)$-Rewrite admit simple algorithms that output a circuit
of size $O(|C|^2)$ in time $\tilde{O}(|C|^2)$. However, we show that any significant improvement on
these algorithms would give a non-trivial $2^{(1-\epsilon)n}\,\mathrm{poly}(n, m)$ time algorithm for deciding
whether an $n$-variable $m$-clause CNF formula is satisfiable; this violates the strong
exponential time hypothesis [13]:

Theorem 4. Neither $(\vee, +)$-Rewrite nor $(\vee, \oplus)$-Rewrite can be solved in time $\tilde{O}(|C|^{2-\epsilon})$
for any constant $\epsilon > 0$, unless the strong exponential time hypothesis fails.

Theorem 4 provides evidence, e.g., for the conjecture $\mathrm{Gap}_{\oplus/\vee}(n) = n^{1-o(1)}$ in the following
sense. If there is a family of matrices $A$ witnessing $C_\oplus(A)/C_\vee(A) = n^{1-o(1)}$, then clearly
no $\tilde{O}(|C|^{2-\epsilon})$-time algorithm exists for $(\vee, \oplus)$-Rewrite: if we are given a minimum-size
$\vee$-circuit for $A$ as input, there is no time to write down a legal output.

Our proof of Theorem 4 shows, in particular, that an $\tilde{O}(|C|^{2-\epsilon})$-time algorithm
for $(\vee, +)$-Rewrite would give an improved algorithm for counting the number of
satisfying assignments to a given CNF formula (#CNF-SAT). Similarly, an $\tilde{O}(|C|^{2-\epsilon})$-time
algorithm for $(\vee, \oplus)$-Rewrite would give an improved algorithm for deciding whether
the number of satisfying assignments is odd ($\oplus$CNF-SAT).

1.2. Notation 

A circuit C is a directed acyclic graph where the vertices of in-degree (or fan-in) 
zero are called input gates and all other vertices are called arithmetic gates. One or 
more arithmetic gates are designated as output gates. The size |C| of the circuit is the 
number of edges (or wires) in the circuit. 

We abbreviate [n] := {1, . . . , n}; all our logarithms are to base 2 by default; and we 
write random variables in boldface. 

2. Related work 

Upper bounds. The trivial depth-1 circuit for a boolean matrix $A$ uses $|A|$ wires, where
we denote by $|A|$ the weight of $A$, i.e., the number of 1-entries in $A$. Even though $|A|$
might be of order $\Theta(n^2)$, Lupanov (as presented by Jukna [16, Lemma 1.2]) constructs
depth-2 circuits (applicable in all the three models) of size $O(n^2/\log n)$ for any $A$. This
implies the universal upper bound

\[ \mathrm{Gap}_{X/Y}(n) = O(n/\log n). \tag{Lupanov} \]

Lower bounds. Standard counting arguments [16, §1.4] show that most $n \times n$ matrices
have wire complexity $\Omega(n^2/\log n)$ in each of the three models. Combining this with
Lupanov's upper bound we conclude that a random matrix does little to separate our
models:

Fact 1. For a uniformly random $A$, the ratio $C_X(A)/C_Y(A)$ is a constant w.h.p.

Unsurprisingly, it can also be shown that finding a minimum-size circuit for a
given matrix is NP-hard in all the models. For $\vee$- and $+$-circuits this follows from
the NP-completeness of the Ensemble Computation problem as defined by Garey and
Johnson [10, PO9]. For $\oplus$-circuits this was proved by Boyar et al. [5].

$\vee$-circuits. The study of $\vee$-circuits (sometimes called rectifier circuits) has been centered
around finding explicit matrices that are hard for $\vee$-circuits. Here, dense rectangle-free
matrices and their generalisations, $(s, t)$-free matrices, are a major source of lower
bounds.

Definition. A matrix $A$ is called $(s, t)$-free if it does not contain an $(s + 1) \times (t + 1)$
all-1 submatrix. Moreover, $A$ is simply called $k$-free if it is $(k, k)$-free.

Nechiporuk [24] and independently Lamagna and Savage [21] constructed the
first examples of dense 1-free matrices $A$ achieving $C_\vee(A) = \Omega(n^{3/2})$. Subsequently,
Mehlhorn [22] and Pippenger [26] established the following theorem that gives a general
template for this type of lower bound; we use it extensively later.

Theorem 5 (Mehlhorn-Pippenger). If $A$ is $(s, t)$-free, then $C_\vee(A) \ge |A|/(st)$.
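To make the template concrete, here is a minimal Python sketch that checks $(s, t)$-freeness by brute force and evaluates the bound $|A|/(st)$. The function names and the choice of test matrix (the point-line incidence matrix of the Fano plane, a standard small 1-free matrix) are assumptions of this illustration, not part of the paper.

```python
from itertools import combinations

def is_free(A, s, t):
    """True if A contains no (s+1) x (t+1) all-1 submatrix (brute force)."""
    n_cols = len(A[0])
    for rows in combinations(range(len(A)), s + 1):
        # Columns that carry a 1 in every one of the chosen rows.
        common = [j for j in range(n_cols) if all(A[i][j] for i in rows)]
        if len(common) >= t + 1:
            return False
    return True

def weight(A):
    """Number of 1-entries |A|."""
    return sum(map(sum, A))

def mehlhorn_pippenger_bound(A, s, t):
    """Lower bound |A|/(st) of Theorem 5; only meaningful if A is (s,t)-free."""
    if not is_free(A, s, t):
        raise ValueError("matrix is not (s, t)-free")
    return weight(A) / (s * t)

# Illustrative example: the seven lines of the Fano plane on points 0..6.
LINES = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
         {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
FANO = [[int(p in line) for p in range(7)] for line in LINES]

bound = mehlhorn_pippenger_bound(FANO, 1, 1)
```

Any two lines of the Fano plane share exactly one point, so the matrix has no $2 \times 2$ all-1 submatrix and the bound equals its weight.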

Currently, the best lower bound for an explicit $A$ is obtained by applying Theorem
5 to a matrix construction of Kollár et al. [19]; the lower bound is $C_\vee(A) \ge n^{2-o(1)}$ (see
also Gashkov and Sergeev [11, §3.2]).

$\oplus$-circuits. It is a long-standing open problem to exhibit explicit matrices requiring
super-linear size $\oplus$-circuits. No such lower bounds are known even for log-depth circuits,
and the only successes are in the case of bounded depth [2, 9], [16, §13.5]. This, together
with Fact 1, makes it particularly difficult to prove lower bounds on $\mathrm{Gap}_{\oplus/\vee}$.

$+$-circuits. Additive circuits have been studied extensively in the context of the addition
chain problem (see Knuth [18, §4.6.3] for a survey) and its generalisations [27].

In cryptography, as observed by Boyar et al. [5], many heuristics that have been
proposed for finding small $\oplus$-circuits produce, in fact, $+$-circuits that do not exploit the
cancellation of variables that is available in $\mathrm{GF}(2)$. Thus, the measure $\mathrm{Gap}_{+/\oplus}$ gives a
lower bound on the approximation ratio achieved by any such minimisation heuristic.

Algebraic complexity. A particular motivation for studying the separation between $\vee$-
and $+$-circuits is to understand the complexity of zeta transforms on partial orders [3].
Indeed, the characteristic matrix of every partial order $\le$ has a $\vee$-circuit of size proportional
to the number of covering pairs in $\le$, but the existence of small $+$-circuits (and hence
fast zeta transforms) is not currently understood satisfactorily.
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Strong exponential time hypothesis. Theorem 4 is similar to other recent lower bound 
results for polynomial-time solvable problems based on the strong exponential time 
hypothesis [25]. See also [7]. 

3. $+/\vee$-Separation

In this section we give a direct sum type theorem for the monotone complexity of
tensor product matrices. Using this, we obtain a separation of the form

\[ C_+(B \otimes A) = \Omega(N^{3/2}/\log^2 N), \tag{1} \]

where $\otimes$ denotes the usual Kronecker product of matrices and $N = n^2$ denotes the
number of input and output variables. This will prove Theorem 1.

3.1. Tensor products

As a first example, let $A$ be a fixed boolean $n \times n$ matrix and consider the matrix
product

\[ X \mapsto AX, \tag{2} \]

where we think of $X$ as a matrix of $N = n \times n$ input variables. If we arrange these
variables into a column vector $x$ by stacking the columns of $X$ on top of one another,
then (2) becomes

\[ x \mapsto (I \otimes A)x, \tag{3} \]

where $I$ is the $n \times n$ identity matrix. That is, $I \otimes A$ is the block matrix having $n$ copies
of $A$ on the diagonal.

The transformation (3) famously admits non-trivial $\oplus$-circuits due to the fact that
fast matrix multiplication algorithms can be expressed as small bilinear circuits over
$\mathrm{GF}(2)$. However, it is easy to see that in the case of our monotone models, no non-trivial
speed-up is possible: any $\vee$-circuit for (3) must compute $A$ independently $n$ times:

\[ C_\vee(I \otimes A) = n \cdot C_\vee(A). \tag{4} \]

This follows from the observation that two subcircuits corresponding to two different
columns of $X$ cannot share gates due to monotonicity.

Our approach. We will generalise the above setting slightly and use tensor products of
the form $B \otimes A$ to separate $\vee$- and $+$-circuits. Analogously to (2), one can check that
the matrix $B \otimes A$ corresponds to computing the mapping

\[ X \mapsto AXB^{\top}. \tag{5} \]
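The correspondence between $B \otimes A$ and the mapping (5) can be checked numerically. The following pure-Python sketch (helper names are hypothetical; integer arithmetic is used for simplicity) stacks the columns of $X$ as described above and compares $(B \otimes A)x$ with the stacked columns of $AXB^{\top}$; taking $B = I$ gives the special case (3).

```python
def matmul(A, B):
    """Plain integer matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def kron(B, A):
    """Kronecker product B (x) A: block (i, j) equals B[i][j] * A."""
    return [[b * a for b in brow for a in arow] for brow in B for arow in A]

def vec(X):
    """Stack the columns of X on top of one another."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 0], [1, 1]]
B = [[1, 1], [0, 1]]          # deliberately non-symmetric
X = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]

lhs = matvec(kron(B, A), vec(X))               # (B (x) A) x
rhs = vec(matmul(matmul(A, X), transpose(B)))  # stacked columns of A X B^T
```

A non-symmetric $B$ is used on purpose, so that the transpose in (5) is actually exercised.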

We aim to show that for suitable choices of $A$ and $B$ computing $B \otimes A$ is easy for
$\vee$-circuits but hard for $+$-circuits. We will choose $A$ to have large complexity (e.g.,
choose $A$ at random), and think of $B$ as dictating how many independent copies of $A$ a
circuit must compute.

More precisely, define $\mathrm{rk}_\vee(B)$ and $\mathrm{rk}_+(B)$ as the minimum $r$ such that $B$ can be
written as $B = PQ^{\top}$ over the boolean semiring or over the semiring of non-negative
integers, respectively, where $P$ and $Q$ are $n \times r$ matrices. Equivalently, $\mathrm{rk}_\vee(B)$ (resp.,
$\mathrm{rk}_+(B)$) is the minimum number of rectangles (resp., non-overlapping rectangles) that
are required to cover all 1-entries of $B$.

These cover numbers appear often in the study of communication complexity [20].
In this context, the matrix $B = \bar{I}$, the boolean complement of the identity $I$, is the
usual example demonstrating a large gap between the two concepts [20, Example 2.5]:

\[ \mathrm{rk}_\vee(\bar{I}) = \Theta(\log n), \qquad \mathrm{rk}_+(\bar{I}) = n. \]
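The logarithmic upper bound on $\mathrm{rk}_\vee(\bar{I})$ comes from a standard bit-based cover, sketched below in Python (the function names and 0-based indexing are assumptions of this illustration). For every bit position $b$ and value $v$, one rectangle collects the rows whose $b$-th bit is $v$ and the columns whose $b$-th bit differs from $v$; two indices differ iff they differ in some bit, so these at most $2\lceil \log_2 n \rceil$ rectangles cover exactly the off-diagonal entries.

```python
def complement_identity_cover(n):
    """Rectangles (rows, cols) whose union is exactly {(i, j) : i != j}."""
    bits = max(1, (n - 1).bit_length())
    rects = []
    for b in range(bits):
        for v in (0, 1):
            rows = {i for i in range(n) if (i >> b) & 1 == v}
            cols = {j for j in range(n) if (j >> b) & 1 != v}
            if rows and cols:
                rects.append((rows, cols))
    return rects

def covered(n, rects):
    """Boolean matrix of the entries covered by at least one rectangle."""
    return [[int(any(i in r and j in c for r, c in rects)) for j in range(n)]
            for i in range(n)]

rects = complement_identity_cover(8)
total_area = sum(len(r) * len(c) for r, c in rects)
```

For $n = 8$ the total area of the six rectangles exceeds the number of off-diagonal entries, so the rectangles necessarily overlap; the cover is valid over the boolean semiring but is not a decomposition over the non-negative integers, in line with $\mathrm{rk}_+(\bar{I}) = n$.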

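We will use this gap to show that, up to polylogarithmic factors,

\[ C_\vee(\bar{I} \otimes A) \approx \mathrm{rk}_\vee(\bar{I}) \cdot n^2, \qquad C_+(\bar{I} \otimes A) \approx \mathrm{rk}_+(\bar{I}) \cdot n^2. \]

In terms of the number of input variables $N = n^2$, we will obtain (1).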

3.2. Upper bound for $\vee$-circuits

Suppose $B = PQ^{\top}$ where $P$ and $Q$ are $n \times \mathrm{rk}_\vee(B)$ matrices. We can compute (5) as

\[ (A(XQ))P^{\top}, \]

which requires 3 matrix multiplications, each involving $\mathrm{rk}_\vee(B)$ as one of the dimensions
(the other dimensions being at most $n$).

If these 3 multiplications are naively implemented with a $\vee$-circuit of depth 3, each
layer will contain at most $\mathrm{rk}_\vee(B) n^2$ wires so that $C_\vee(B \otimes A) \le 3\,\mathrm{rk}_\vee(B) n^2$. However,
one can still use Lupanov's techniques to save an additional logarithmic factor: if
$\mathrm{rk}_\vee(B) = O(\log n)$, Corollary 1.35 in Jukna [16] can be applied to show that each of
the three multiplications above can be computed using $O(n^2)$ wires. Thus, for $B = \bar{I}$
we get

Lemma 6. $C_\vee(\bar{I} \otimes A) = O(n^2)$ for all $A$. $\square$

3.3. Lower bound for $+$-circuits

Intuitively, since low-rank decompositions are not available for $\bar{I}$ in the semiring of
non-negative integers, a $+$-circuit for $\bar{I} \otimes A$ should be forced to compute $\mathrm{rk}_+(\bar{I}) = n$
independent copies of $A$. More generally, we ask

Direct sum question. Do we have $C_+(B \otimes A) \ge \mathrm{rk}_+(B) \cdot C_+(A)$ for all $A$, $B$?

Alas, we can answer this affirmatively only in some special cases. For example,
the trivial case $B = I$ was discussed above (4), and it is not hard to generalise the
argument to show that the lower bound holds in case $B$ admits a fooling set of size
$\mathrm{rk}_+(B)$. (When $B$ is viewed as an incidence matrix of a bipartite graph, a fooling set is
a matching no two of whose edges induce a 4-cycle. See [20, §1.3].) However, since this
will not be the case when $B = \bar{I}$, we will settle for the following version, which suffices
for the separation result.

Theorem 7. For all $(s, t)$-free $A$,

\[ C_+(B \otimes A) \ge \mathrm{rk}_+(B) \cdot \frac{|A|}{st}. \tag{6} \]

Note that if we set $B = I$ in Theorem 7 we recover essentially Theorem 5.

For the purposes of the proof we switch to the combinatorial perspective: For $A$
and $B$ we introduce two sets of $n$ formal variables $X_A$ and $X_B$. Moreover, we let
$A_1, \ldots, A_n \subseteq X_A$ and $B_1, \ldots, B_n \subseteq X_B$ denote the associated outputs. That is, each
output $A_i$ is defined by one row of $A$, and each output $B_j$ is defined by one row of $B$.
With this terminology, the input variables for $B \otimes A$ are the pairs in $X_A \times X_B$; we
think of $X_A$ as indexing the rows and $X_B$ as indexing the columns of the variable matrix
$X_A \times X_B$. Finally, $B \otimes A$ corresponds to computing the $n^2$ outputs

\[ A_i \times B_j, \qquad \text{for } i, j \in [n]. \]

In the following proof we use the $(s, t)$-freeness of $A$ to "zoom in" on that layer of
the circuit which reveals the large wire complexity (similarly to Mehlhorn [22]). We
advise the reader to first consider the case $s = t = 1$, as this already contains the main
idea of the proof.

Proof of Theorem 7. Let $C$ be a $+$-circuit computing $B \otimes A$. As a first step, we simplify $C$ by
allowing input gates to have larger-than-singleton supports. Namely, let $F$ consist of
those gates of $C$ whose supports are contained in a $t$-wide row cylinder of the form
$Y \times X_B$ where $Y \subseteq X_A$ and $|Y| \le t$. We simply declare that all computations done
by gates in $F$ come for free: we promote a gate in $F$ to an input gate and delete all
its incoming wires. We continue to denote the modified circuit by $C$; clearly, these
modifications only decrease its wire complexity.

Call a wire that is connected to an input gate an input wire and denote the set
of input wires by $W$. The wire complexity lower bound (6) will follow already from
counting the number $|W|$ of input wires.

For $i \in [n]$ denote by $C_i$ the subcircuit of $C$ computing the $n$ outputs $A_i \times B_j$, $j \in [n]$,
and denote by $W(i)$ the input wires of $C_i$; we claim that

\[ |W(i)| \ge \mathrm{rk}_+(B) \cdot \frac{|A_i|}{t}. \tag{7} \]

Before we prove (7), we note how it implies the theorem. Each input wire $w \in W$
is feeding into a non-input gate whose support is not contained in a $t$-wide row
cylinder. Due to $(s, t)$-freeness of $A$ this means that $w$ can appear only in at most $s$
different $C_i$. Thus, the sum $\sum_i |W(i)|$ counts $w$ at most $s$ times and, more generally,
we have

\[ |W| = \Bigl| \bigcup_{i=1}^{n} W(i) \Bigr| \ge \frac{1}{s} \sum_{i=1}^{n} |W(i)|, \]

which implies (6) given (7).
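The final step can be spelled out as follows. This short derivation is added for convenience; it takes $C$ to be a minimum-size $+$-circuit for $B \otimes A$ and uses only the counting bound above, the claim (7), the identity $\sum_i |A_i| = |A|$, and the fact that the modified circuit has no more wires than the original one.

```latex
\begin{align*}
C_+(B \otimes A) \;\ge\; |W|
  &\ge \frac{1}{s} \sum_{i=1}^{n} |W(i)| \\
  &\ge \frac{1}{s} \sum_{i=1}^{n} \mathrm{rk}_+(B) \cdot \frac{|A_i|}{t}
   \;=\; \mathrm{rk}_+(B) \cdot \frac{|A|}{st}.
\end{align*}
```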



Proof of (7). Fix $i \in [n]$. If $A_i$ is empty the claim is trivial. Otherwise fix a variable
$x \in A_i$ and consider the structure of $C_i$ when restricted to the variables $\{x\} \times X_B$. Since
this set of variables can be naturally identified with $X_B$ by ignoring the first coordinate,
we can view $C_i$ as computing a copy of $B$ on the variables $\{x\} \times X_B$.

Indeed, we define the $x$-support $\mathrm{supp}_x(w)$ of an input wire $w \in W(i)$ to be the set
of $y \in X_B$ such that the variable $(x, y)$ is contained in the support of $w$. (The support
of $w$ is simply the support of the adjacent input gate.) Moreover, we let

\[ W_x(i) := \{ w \in W(i) : \mathrm{supp}_x(w) \ne \emptyset \}. \]

Put otherwise, $W_x(i)$ consists of the input wires that are used by $C_i$ in computing a
copy of $B$ on the variables $\{x\} \times X_B$. Associate to each $w \in W_x(i)$ a rectangle

\[ R_x(w) := \text{co-supp}_x(w) \times \mathrm{supp}_x(w), \]

where $\text{co-supp}_x(w)$ is the set of $j \in [n]$ such that $w$ appears in the subcircuit $C_{ij}$ of $C_i$
that computes the output $A_i \times B_j$. Now, the crucial observation is that the collection of
rectangles $\{R_x(w) : w \in W_x(i)\}$ is a non-overlapping cover of $B$, because $C_i$ computes a
copy of $B$ by taking disjoint unions of the supports $\{\mathrm{supp}_x(w) : w \in W_x(i)\}$. Therefore,
we must have that

\[ |W_x(i)| \ge \mathrm{rk}_+(B). \tag{8} \]

To finish the proof, we note that a single input wire $w \in W(i)$, being $t$-wide, can
only be contained in the sets $W_x(i)$ for at most $t$ different $x \in A_i$. Thus, the sum
$\sum_x |W_x(i)|$ counts $w$ at most $t$ times and, more generally, we have

\[ |W(i)| = \Bigl| \bigcup_{x \in A_i} W_x(i) \Bigr| \ge \frac{1}{t} \sum_{x \in A_i} |W_x(i)|, \]

which implies (7) given (8). $\square$

As will be shortly discussed in Section 4.1, a random matrix $A \in \{0, 1\}^{n \times n}$ is
$O(\log n)$-free and has weight $|A| = \Theta(n^2)$ w.h.p. Using these facts we obtain the
following corollary, which, together with Lemma 6, proves Theorem 1.

Corollary 8. A random $A$ satisfies $C_+(\bar{I} \otimes A) = \Omega(n^3/\log^2 n)$ w.h.p. $\square$
4. $\vee/\oplus$-Separation

In this section we use the probabilistic method to construct $k$-uniform matrices $A$
that, for large enough $k$, will witness the following complexity gap with high probability:

\[ C_\oplus(A) = O(n), \qquad C_\vee(A) = \Omega(n^2/\log^2 n). \]

In what follows, all matrix arithmetic will be over $\mathbb{F} = \mathrm{GF}(2)$.

4.1. Motivation for $k$-uniform matrices

Suppose first that $A \in \mathbb{F}^{n \times n}$ is a random matrix where each entry is drawn uniformly
and independently from $\mathbb{F}$. The probability that $A$ fails to be $(k - 1)$-free can be bounded
from above by taking the union bound over all possible $k \times k$ submatrices:

\[ \Pr[\, A \text{ is not } (k - 1)\text{-free} \,] \le \binom{n}{k}^{2} 2^{-k^2}. \tag{9} \]

It is easy to check (and well-known in the context of random graphs [4, §11]) that for
$k \ge 2 \log n$ this quantity tends to 0 as $n \to \infty$.
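For a feel of how quickly the bound (9) decays, the following Python sketch evaluates its right-hand side exactly for a few values of $n$. The function names and the choice $k = \lceil 2 \log_2 n \rceil$ are assumptions of this example. Since $\binom{n}{k} \le n^k/k!$ and $n^{2k} \le 2^{k^2}$ whenever $k \ge 2 \log n$, the bound is at most $1/(k!)^2$, which the snippet also lets one confirm.

```python
from fractions import Fraction
from math import ceil, comb, factorial, log2

def not_free_bound(n, k):
    """Right-hand side of (9), binom(n, k)^2 * 2^(-k^2), as an exact fraction."""
    return Fraction(comb(n, k) ** 2, 2 ** (k * k))

def threshold(n):
    """Smallest integer k with k >= 2 log2(n)."""
    return ceil(2 * log2(n))

# Evaluate the bound at k = threshold(n) for a few powers of two.
bounds = {n: not_free_bound(n, threshold(n)) for n in (16, 64, 256, 1024)}

# Compare against the cruder estimate 1 / (k!)^2 derived in the text above.
crude = {n: Fraction(1, factorial(threshold(n)) ** 2) for n in bounds}
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point underflow for the very small values involved.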

Our key observation here is that the estimate (9) only uses the property that the
entries in each $k \times k$ submatrix of $A$ are mutually independent. Indeed, the above
analysis holds even when $A$ is only $k$-uniform for $k \ge 2 \log n$. Thus, we have the
following lemma.

Lemma 9. If $A$ is $k$-uniform for $k \ge 2 \log n$, then w.h.p.,

\[ C_\vee(A) = \Omega(n^2/\log^2 n). \]

Proof. Any 2-uniform matrix $A$ has pairwise independent entries so that $|A| = \Theta(n^2)$
w.h.p. by Chebyshev's inequality. On the other hand, the above discussion implies that
$A$ is $2 \log n$-free w.h.p. Thus, the claim follows from Theorem 5. $\square$

Corollary 3 is a consequence of Lemma 9 and Theorem 2. Thus, our remaining goal
in this section is to prove Theorem 2.

4.2. Proof of Theorem 2

Let $m := O(\sqrt{n})$. To construct a $k$-uniform matrix $A$ we start with an $m \times n$ matrix
$P$ that satisfies the following two properties:

(1) $P$ has linear $\oplus$-complexity, $C_\oplus(P) = O(n)$.

(2) Each set of $k = n^{\Omega(1)}$ columns of $P$ is linearly independent.

Miltersen [23] shows that such P can be obtained as submatrices of certain generating 
matrices of linear codes, e.g., those of Spielman 




Theorem 10 (Miltersen [23, Theorem 1.4]). Let $D \subseteq \mathbb{F}^n$. There are $O(\log |D|) \times n$
matrices $P$ with $C_\oplus(P) = O(n)$ such that the mapping $x \mapsto Px$ is injective on $D$.

Indeed, let $D \subseteq \mathbb{F}^n$ be the set of vectors of Hamming weight at most $k$. Note that if
$P$ is injective on $D$, then it clearly has property (2). We also have that $|D| \le (n + 1)^k$
and so $\log |D| = O(k \log n)$. Thus, if we set $k := \sqrt{n}/\log n$, we can apply Theorem 10
to obtain our desired $m \times n$ matrix $P$.

We can now define

\[ A := P^{\top} R P, \]

where $R \in \mathbb{F}^{m \times m}$ is a matrix chosen uniformly at random; note that
$C_\oplus(R) \le |R| \le m^2 = O(n)$. If we compute $A$ in three stages in the obvious way, we obtain

\[ C_\oplus(A) \le C_\oplus(P^{\top}) + C_\oplus(R) + C_\oplus(P) = O(n), \]

where we used the fact that $C_\oplus(P^{\top}) = O(C_\oplus(P))$; roughly, this follows from simply
reversing the direction of the wires in a $\oplus$-circuit computing $P$ (see Jukna [16, p. 46]).
It remains to show that $A$ is $k$-uniform. In fact, since our definition of $A$ is a
generalisation of how $k$-wise independent variables are typically constructed [1, §15.2],
the proof of the following lemma is somewhat routine.

Lemma 11. $A$ is $k$-uniform.

Proof. We need to show that each submatrix $A_{I \times J}$, where $I, J \subseteq [n]$ and $|I| = |J| = k$,
is uniformly distributed in $\mathbb{F}^{k \times k}$. Write

\[ A_{I \times J} = P_I^{\top} R P_J, \]

where $P_K$ is the submatrix of $P$ consisting of the columns with indices in $K \subseteq [n]$.

Claim. $B := R P_J$ is uniformly distributed in $\mathbb{F}^{m \times k}$.

Proof of Claim. Let $B_i = R_i P_J$ denote the $i$-th row of $B$. The rows $B_i$, $i \in [m]$,
are mutually independent variables, since the variables $R_i$, $i \in [m]$, are. Therefore it
suffices to show that $B_i$ is uniformly distributed in $\mathbb{F}^{1 \times k}$ for each $i \in [m]$.

To this end, fix $i \in [m]$; we show that all the outcomes $B_i = y$ where $y \in \mathbb{F}^{1 \times k}$
are equally likely. For any $y \in \mathbb{F}^{1 \times k}$ there is a vector $x \in \mathbb{F}^{1 \times m}$ with $x P_J = y$ since $P_J$
has linearly independent columns. Hence $R_i P_J = y$ iff $(R_i - x) P_J = 0$. But $R_i - x$
is distributed the same as $R_i$ so that $\Pr[R_i P_J = y] = \Pr[R_i P_J = 0]$ is independent of
the choice of $y$, as desired. $\square$

Finally, the same analysis as above demonstrates that $A_{I \times J} = P_I^{\top} B$ is uniformly
distributed in $\mathbb{F}^{k \times k}$, proving the lemma. $\square$
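Lemma 11 can be verified exhaustively on a toy instance. The Python sketch below is an illustration only: the $3 \times 4$ matrix $P$ is a hand-picked assumption (its columns are distinct nonzero vectors, so any $k = 2$ of them are linearly independent over $\mathrm{GF}(2)$) and is not one of Miltersen's matrices. The code enumerates all $2^9$ choices of $R$ and checks that every $2 \times 2$ submatrix of $A = P^{\top} R P$ takes each of its 16 possible values equally often.

```python
from collections import Counter
from itertools import combinations, product

M, N, K = 3, 4, 2
# Columns of P: distinct nonzero vectors of GF(2)^3 (a hypothetical toy choice).
P = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]

def submatrix_of_A(R, I, J):
    """Entries of A = P^T R P in rows I and columns J, computed over GF(2)."""
    return tuple(
        sum(P[a][i] * R[a][b] * P[b][j] for a in range(M) for b in range(M)) % 2
        for i in I for j in J
    )

def is_uniform(I, J):
    """True if A_{I x J} hits all 2^(K*K) values equally often over all R."""
    counts = Counter()
    for bits in product((0, 1), repeat=M * M):        # all 2^9 matrices R
        R = [bits[r * M:(r + 1) * M] for r in range(M)]
        counts[submatrix_of_A(R, I, J)] += 1
    return len(counts) == 2 ** (K * K) and len(set(counts.values())) == 1

all_uniform = all(
    is_uniform(I, J)
    for I in combinations(range(N), K)
    for J in combinations(range(N), K)
)
```

The check also covers overlapping index sets such as $I = J$, which the lemma allows.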

Remark. Interestingly, Theorem 5 is unable to prove a better lower bound than
$C_\vee(A) = \Omega(n^2/\log^2 n)$ for any matrix $A$. Is it true that for every $n^{\Omega(1)}$-uniform $A$, we have
that $C_\vee(A) = \Theta(n^2/\log n)$ w.h.p.? A positive answer would give the tight bound
$\mathrm{Gap}_{\vee/\oplus}(n) = \Theta(n/\log n)$.
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5. Rewriting 

In this section we study what would happen if $(\vee, +)$-Rewrite or $(\vee, \oplus)$-Rewrite
could be solved in subquadratic time. Namely, we show that this eventuality would
contradict the strong exponential time hypothesis. This will prove Theorem 4. As
discussed in Section 1.1, we interpret this as evidence for our conjectures $\mathrm{Gap}_{+/\vee}(n) = n^{1-o(1)}$
and $\mathrm{Gap}_{\oplus/\vee}(n) = n^{1-o(1)}$.

5.1. Preliminaries 

For purposes of computations, we tacitly assume that $|C| \ge n$ for any $n$-input circuit
$C$ considered in this section. This is to make each $C$ admit a binary representation of
length $\tilde{O}(|C|)$ where the $\tilde{O}$ notation hides factors polylogarithmic in $n$. For concreteness,
$C$ might be represented as two lists: (i) the list of gates in $C$, with output gates indicated,
and (ii) the list of wires in $C$; both lists are given in topological order, with the input
wires of each gate forming a consecutive sublist of the list of wires. Whatever the
encoding, we assume it is efficient enough so that the following property holds.

Proposition 12. On input an $X$-circuit $C$ and a vector $x$, the output $C(x)$ can be
computed in time $\tilde{O}(|C|)$ (in the usual RAM model of computation). $\square$

The following proposition records a similar observation for circuit rewriting.

Proposition 13. Both $(\vee, +)$-Rewrite and $(\vee, \oplus)$-Rewrite can be solved in time $\tilde{O}(|C|^2)$.

Proof. Suppose we are given a $\vee$-circuit $C$ as input. The matrix $A$ computed by $C$ can
be easily extracted from $C$ in time $\tilde{O}(|C|^2)$. We then simply output the trivial depth-1
$+$-circuit for $A$ that has size at most $n^2 \le |C|^2$. $\square$

5.2. Proof of Theorem 4 

The main technical ingredient in our proof is Lemma 14 below, which states that if 
subquadratic-time rewriting algorithms exist, then certain simple covering problems 
can be solved faster than in a trivial manner. 

In the following we consider set systems defined by $L_1, \ldots, L_n$ and $R_1, \ldots, R_n$ that
are (not necessarily distinct) subsets of $[m]$. We say that $(i, j)$ is a covering pair if
$L_j \cup R_i = [m]$.

Lemma 14. Suppose we are given sets $L_1, \ldots, L_n, R_1, \ldots, R_n \subseteq [m]$ as input.

(a) If $(\vee, +)$-Rewrite can be solved in time $\tilde{O}(|C|^{2-\epsilon})$ for some constant $\epsilon > 0$, then the
number of covering pairs can be computed in time $\tilde{O}((nm)^{2-\epsilon})$.

(b) If $(\vee, \oplus)$-Rewrite can be solved in time $\tilde{O}(|C|^{2-\epsilon})$ for some constant $\epsilon > 0$, then the
parity of the number of covering pairs can be computed in time $\tilde{O}((nm)^{2-\epsilon})$.
Proof of (a). Let $A = (a_{ij})$ be an $n \times n$ matrix defined by $a_{ij} = 1$ iff $(i, j)$ is a covering
pair. We show how to compute $|A|$ without constructing $A$ explicitly.

Suppose for a moment that we had a small $+$-circuit $C$ for $A$. The value $|A|$ can be
recovered from the circuit $C$ in time $\tilde{O}(|C|)$ via the following trick: evaluate $C$ (over the
integers) on the all-1 vector $\mathbf{1}$ to obtain $y = C(\mathbf{1}) \in \mathbb{N}^n$; but now

\[ |A| = \mathbf{1}^{\top} A \mathbf{1} = \mathbf{1}^{\top} C(\mathbf{1}) = y_1 + \cdots + y_n. \tag{10} \]
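The trick (10) is easy to try out on a toy $+$-circuit. In the Python sketch below, the circuit, its dictionary encoding, and the hand-written matrix $A$ are assumptions chosen for illustration. A single evaluation on the all-1 vector yields the weight of the matrix without ever writing the matrix down.

```python
# Hypothetical +-circuit on inputs x1, x2, x3: each gate lists its children and
# there is at most one directed path from any input to any output.
GATES = {"g": ["x1", "x2"], "y1": ["g", "x3"], "y2": ["g"], "y3": ["x3"]}
OUTPUTS = ["y1", "y2", "y3"]

def evaluate_sum(x):
    """Evaluate the circuit over the non-negative integers."""
    val = {"x1": x[0], "x2": x[1], "x3": x[2]}
    for gate, children in GATES.items():  # insertion order is topological here
        val[gate] = sum(val[c] for c in children)
    return [val[o] for o in OUTPUTS]

# The matrix computed by the circuit, written out by hand for comparison only.
A = [[1, 1, 1],
     [1, 1, 0],
     [0, 0, 1]]

weight_from_circuit = sum(evaluate_sum([1, 1, 1]))   # 1^T C(1)
weight_from_matrix = sum(map(sum, A))                # |A|
```

The same evaluation on a $\vee$-circuit would only report which outputs are nonzero, which is why the argument needs the rewritten $+$-circuit.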

Unfortunately, we do not know how to construct a small $+$-circuit for $A$. Instead,
our key observation below will be that the complement matrix $\bar{A}$ admits a $\vee$-circuit
$C^{\vee}$ of size only $|C^{\vee}| = O(nm)$. By assumption, we can then rewrite $C^{\vee}$ as a $+$-circuit $C^{+}$
in time $\tilde{O}(|C^{\vee}|^{2-\epsilon}) = \tilde{O}((nm)^{2-\epsilon})$. In particular, the size of the new circuit must also be

\[ |C^{+}| = \tilde{O}((nm)^{2-\epsilon}). \]

Analogously to (10) we can then recover $|A|$ from $C^{+}$ in time $\tilde{O}(|C^{+}|)$:

\[ |A| = n^2 - |\bar{A}| = n^2 - \mathbf{1}^{\top} C^{+}(\mathbf{1}). \]

Indeed, it remains to describe how to construct $C^{\vee}$ for $\bar{A}$ in time $\tilde{O}(nm)$.

Construction. Define a depth-2 circuit $C^{\vee}$ as follows: The 0-th layer of $C^{\vee}$ hosts input
gates $\ell_j$, $j \in [n]$; the 1-st layer contains intermediate gates $g_k$, $k \in [m]$; and the 2-nd
layer contains output gates $r_i$, $i \in [n]$. Each input gate $\ell_j$ is connected to gates $g_k$ for
$k \in [m] \setminus L_j$; similarly, each output gate $r_i$ is connected to gates $g_k$ for $k \in [m] \setminus R_i$.
To see that $C^{\vee}$ computes $\bar{A}$ note that there is a path from input $\ell_j$ to output $r_i$ iff
there is a $k \in [m]$ such that $k \notin L_j \cup R_i$ iff $(i, j)$ is not a covering pair. Note also that
$|C^{\vee}| \le 2nm$ and that the construction takes time $\tilde{O}(nm)$. $\square$
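The construction can be sketched in a few lines of Python. The helper names, the 0-based ground set $\{0, \ldots, m-1\}$, and the tiny example instance are assumptions of this illustration. Note that the sketch does not perform any rewriting: it recovers $|\bar{A}|$ by evaluating the $\vee$-circuit on all $n$ unit vectors, which costs about $n^2 m$ steps and is exactly the trivial running time that a subquadratic rewriting algorithm, followed by one evaluation on the all-1 vector, would improve upon.

```python
def complement_circuit(L, R, m):
    """Wires of the depth-2 circuit: input l_j -> g_k for k not in L[j],
    and g_k -> output r_i for k not in R[i]."""
    in_wires = [(j, k) for j, Lj in enumerate(L) for k in range(m) if k not in Lj]
    out_wires = [(k, i) for i, Ri in enumerate(R) for k in range(m) if k not in Ri]
    return in_wires, out_wires

def or_evaluate(in_wires, out_wires, x, m, n):
    """Evaluate the circuit over the boolean semiring, layer by layer."""
    g = [0] * m
    for j, k in in_wires:
        g[k] |= x[j]
    y = [0] * n
    for k, i in out_wires:
        y[i] |= g[k]
    return y

def count_covering_pairs(L, R, m):
    """n^2 minus the weight of the complement matrix computed by the circuit."""
    n = len(L)
    in_wires, out_wires = complement_circuit(L, R, m)
    complement_weight = 0
    for j in range(n):                      # one unit vector per column
        x = [int(t == j) for t in range(n)]
        complement_weight += sum(or_evaluate(in_wires, out_wires, x, m, n))
    return n * n - complement_weight

# Small hypothetical instance with n = 3 sets on each side and m = 3.
L = [{0, 1}, {2}, {0, 1, 2}]
R = [{2}, {0, 1}, set()]
pairs = count_covering_pairs(L, R, 3)
```

The circuit has at most $2nm$ wires, matching the size bound stated in the construction.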

Proof of (b). The proof is the same as above, except we work over $\mathrm{GF}(2)$. $\square$

Next, we reduce #CNF-SAT and $\oplus$CNF-SAT to the covering problems in Lemma 14.
Here we are essentially applying a technique of Williams [30, Theorem 5].



Theorem 15. We have the following reductions:

(a) If $(\vee, +)$-Rewrite can be solved in time $\tilde{O}(|C|^{2-\epsilon})$ for some $\epsilon > 0$, then #CNF-SAT
can be solved in time $2^{(1-\epsilon/2)n}\,\mathrm{poly}(n, m)$.

(b) If $(\vee, \oplus)$-Rewrite can be solved in time $\tilde{O}(|C|^{2-\epsilon})$ for some $\epsilon > 0$, then $\oplus$CNF-SAT
can be solved in time $2^{(1-\epsilon/2)n}\,\mathrm{poly}(n, m)$.

Proof. Let $\varphi = \{C_1, \ldots, C_m\}$ be an instance of CNF-SAT over variables $x_1, \ldots, x_n$. Without
loss of generality (by inserting one variable as necessary), we may assume that $n$ is
even. Call the variables $x_1, \ldots, x_{n/2}$ left variables and the variables $x_{n/2+1}, \ldots, x_n$ right
variables.
For each truth assignment $s \in \{0, 1\}^{n/2}$ to the left variables, let $L_s \subseteq \varphi$ be the set
of clauses satisfied by $s$. Similarly, for each assignment $t \in \{0, 1\}^{n/2}$ to the right variables,
let $R_t \subseteq \varphi$ be the set of clauses satisfied by $t$. Clearly, the compound assignment
$(s, t)$ to all the variables satisfies $\varphi$ if and only if $L_s \cup R_t = \varphi$. That is, the number
of satisfying assignments is precisely the number of covering pairs of the set system
$\{L_s, R_t\}$, $s, t \in \{0, 1\}^{n/2}$. This set system consists of $2^{n/2}$ sets on each side over a ground
set of size $m$, so Lemma 14 runs in time $\tilde{O}((2^{n/2} m)^{2-\epsilon}) = 2^{(1-\epsilon/2)n}\,\mathrm{poly}(n, m)$.
Thus, both claims follow from Lemma 14. $\square$
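The split-and-cover reformulation used in this proof can be sketched as follows. The clause encoding (signed integers, as in the DIMACS convention), the function names, and the example formula are assumptions of this illustration. The final scan over all pairs $(s, t)$ is the trivial quadratic step; the theorem says that a subquadratic rewriting algorithm would allow this step to be replaced by something faster via Lemma 14.

```python
from itertools import product

def satisfied(clauses, assignment):
    """Indices of the clauses satisfied by a partial assignment {variable: bool}.
    A literal is the integer +v or -v for a variable v >= 1."""
    return frozenset(
        idx for idx, clause in enumerate(clauses)
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
    )

def count_sat_by_splitting(clauses, n):
    """Count satisfying assignments as covering pairs (n is assumed even)."""
    half = n // 2
    left, right = range(1, half + 1), range(half + 1, n + 1)
    everything = frozenset(range(len(clauses)))
    L = [satisfied(clauses, dict(zip(left, s)))
         for s in product((False, True), repeat=half)]
    R = [satisfied(clauses, dict(zip(right, t)))
         for t in product((False, True), repeat=half)]
    # Trivial quadratic scan over all pairs; Lemma 14 is about beating this.
    return sum(1 for Ls in L for Rt in R if Ls | Rt == everything)

def count_sat_brute_force(clauses, n):
    """Reference count over all 2^n assignments."""
    return sum(
        1 for bits in product((False, True), repeat=n)
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)
    )

# Hypothetical 4-variable formula:
# (x1 or not x3) and (x2 or x3 or not x4) and (not x1 or not x2) and (x4).
CLAUSES = [[1, -3], [2, 3, -4], [-1, -2], [4]]
by_splitting = count_sat_by_splitting(CLAUSES, 4)
by_brute_force = count_sat_brute_force(CLAUSES, 4)
```

Both counts agree on the example, as they must, since a clause is satisfied by $(s, t)$ exactly when it is satisfied by $s$ or by $t$.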

We can now finish the proof of Theorem 4: 

- For $(\vee, +)$-Rewrite the result follows immediately from Theorem 15.

- For $(\vee, \oplus)$-Rewrite we need to make the following additional argument. As
discussed by Cygan et al. [7] the $k$-CNF Isolation Lemma of Calabro et al. [6]
can be applied to show that any $2^{(1-\epsilon)n}\,\mathrm{poly}(n, m)$ time algorithm for $\oplus$CNF-SAT
can be turned into a $2^{(1-\epsilon')n}\,\mathrm{poly}(n, m)$ time Monte Carlo algorithm for CNF-SAT
where $\epsilon' > 0$. Recognising this, the result follows from Theorem 15.
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